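It is well known that a reliable and robust framework linking multi-agent deep reinforcement learning algorithms to practical multi-robot applications is hard to come by. To fill this gap, we propose and build an open-source framework for multi-robot systems called MultiRoboLearn. The framework provides a unified setup for simulation and real-world applications. It aims to offer standard, easy-to-use simulated scenarios that can also be deployed easily to real-world multi-robot environments. In addition, the framework gives researchers a benchmark system for comparing the performance of different reinforcement learning algorithms. We demonstrate the generality, scalability, and capability of the framework using different types of multi-agent deep reinforcement learning algorithms in both discrete and continuous action spaces.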
Translated by Google Translate
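Low-light video enhancement (LLVE) is an important yet challenging task for many applications, such as photography and autonomous driving. Unlike single-image low-light enhancement, most LLVE methods exploit temporal information from adjacent frames to restore the color and remove the noise of the target frame. However, these algorithms, which build on a multi-frame alignment and enhancement framework, may produce multi-frame fusion artifacts under extreme low light or fast motion. In this paper, inspired by the low latency and high dynamic range of events, we use synthetic events generated from multiple frames to guide the enhancement and restoration of low-light videos. Our method contains three stages: 1) event synthesis and enhancement, 2) event and image fusion, and 3) low-light enhancement. Within this framework, we design two novel modules, event-image fusion transform and event-guided dual branch, for the second and third stages, respectively. Extensive experiments show that our method outperforms existing low-light video and single-image enhancement methods on both synthetic and real LLVE datasets.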
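The idea of synthesizing events from video frames can be illustrated with the contrast-threshold model commonly used to describe event cameras: a pixel emits an event when its log intensity changes by more than a threshold. The sketch below is only an illustration under that assumption; it is not the event synthesis stage of the paper, and the function name `synthesize_events`, the `threshold` value, and the `eps` offset are all hypothetical choices.

```python
import math


def synthesize_events(prev_frame, next_frame, threshold=0.2, eps=1e-3):
    """Return (x, y, polarity) events between two grayscale frames.

    Frames are lists of rows with intensities in [0, 1]. This is the generic
    contrast-threshold model (an assumption), not the paper's method.
    """
    events = []
    for y, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame)):
        for x, (p, n) in enumerate(zip(prev_row, next_row)):
            # Change in log intensity; eps avoids log(0) on black pixels.
            delta = math.log(n + eps) - math.log(p + eps)
            if abs(delta) >= threshold:
                events.append((x, y, 1 if delta > 0 else -1))
    return events


prev_frame = [[0.1, 0.5], [0.5, 0.5]]
next_frame = [[0.4, 0.5], [0.5, 0.2]]
events = synthesize_events(prev_frame, next_frame)
```

Only the two pixels whose brightness changed produce events, one with positive and one with negative polarity; unchanged pixels produce none, which is what makes the representation compact.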
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, or not at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token-based and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification for the ViT-Tiny, ViT-Small, and ViT-Base models, with gains of +4.2%/+2.4%/+1.4%, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as most previous works do. Code is available at https://github.com/OliverRensu/TinyMIM.
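To make "distilling token relations" concrete, the following minimal sketch compares token-to-token relation matrices of a teacher and a student with a KL divergence. It is an illustration only, not the TinyMIM implementation: the relation used here (a row-wise softmax over scaled dot products of token features) and the names `token_relations` and `relation_distillation_loss` are assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_relations(tokens):
    """N x N relation matrix: softmax of scaled dot products between tokens."""
    scale = 1.0 / math.sqrt(len(tokens[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(t_i, t_j)) for t_j in tokens])
        for t_i in tokens
    ]


def relation_distillation_loss(teacher_tokens, student_tokens):
    """Mean KL(teacher || student) over the rows of the relation matrices."""
    teacher_rel = token_relations(teacher_tokens)
    student_rel = token_relations(student_tokens)
    total = 0.0
    for p_row, q_row in zip(teacher_rel, student_rel):
        total += sum(p * math.log(p / q) for p, q in zip(p_row, q_row))
    return total / len(teacher_rel)


# Three tokens each; the teacher uses 4-dim features, the student 2-dim.
teacher = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.3, 0.1], [0.5, 0.5, 0.0, 0.9]]
student = [[0.2, 0.1], [0.4, 0.3], [0.1, 0.9]]
loss = relation_distillation_loss(teacher, student)
```

Because a relation matrix has shape N x N for N tokens regardless of the feature width, the teacher and student can have different embedding dimensions without an extra projection layer, which is one practical appeal of relation targets over direct feature matching.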
Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open-source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when provided a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively, through a large-scale user study. Finally, we offer lessons learned and a discussion of the potential shown by multimodal models to enhance future techniques for automated software documentation.
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. For topic extraction to facilitate clustering, we can first project texts into a topic space and then run a clustering algorithm to obtain clusters. For clustering to promote topic extraction, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately prevents them from benefiting each other and from achieving the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop), which integrates text clustering and topic extraction into a unified framework and can achieve high-quality clustering results and extract topics from each cluster simultaneously. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering, and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On the one hand, it provides text embeddings with a strong cluster structure, which facilitates effective text clustering; on the other hand, it pays close attention to topic-related words for topic extraction because of its self-attention architecture. Moreover, the training of the enhanced language model is unsupervised. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations in this framework.
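The last component, extracting topics from each cluster, can be sketched as follows. This example does not use ClusTop's attention-based extraction; it swaps in a simple cluster-level TF-IDF-style score as a stand-in, and the function name `cluster_topics`, the scoring formula, and the toy documents are all hypothetical. Cluster labels are assumed to come from an earlier clustering step.

```python
import math
from collections import Counter, defaultdict


def cluster_topics(docs, labels, top_k=2):
    """Return the top_k highest-scoring words for each cluster label.

    Score = within-cluster term frequency * log(1 + clusters / clusters
    containing the word). A stand-in for attention-based topic extraction.
    """
    cluster_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        cluster_counts[label].update(doc.lower().split())

    # In how many clusters does each word occur?
    cluster_freq = Counter()
    for counts in cluster_counts.values():
        cluster_freq.update(counts.keys())

    n_clusters = len(cluster_counts)
    topics = {}
    for label, counts in cluster_counts.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(1 + n_clusters / cluster_freq[word])
            for word, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        topics[label] = ranked[:top_k]
    return topics


docs = [
    "the cat chased the mouse",
    "cat and kitten play",
    "stocks fell as the market closed",
    "the market rallied on stocks",
]
labels = [0, 0, 1, 1]
topics = cluster_topics(docs, labels)
```

Words that occur in every cluster, such as "the", are down-weighted, so the words ranked first tend to be the ones that distinguish a cluster from the others.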
Cognitive Computing (COC) aims to build highly cognitive machines with low computational resources that respond in real time. However, the scholarly literature shows varying research areas and various interpretations of COC. This calls for a cohesive architecture that delineates the nature of COC. We argue that if Herbert Simon considered design science to be the science of the artificial, then cognitive systems are the products of cognitive science, or 'the newest science of the artificial'. Therefore, building a conceptual basis for COC is an essential step toward prospective cognitive computing-based systems. This paper proposes an architecture of COC by analyzing the literature on COC using a range of statistical analysis methods. Then, we compare the statistical analysis results with previous qualitative analysis results to confirm our findings. The study also comprehensively surveys recent research on COC to identify the state of the art and connect the advances in varied research disciplines in COC. The study found that there are three underlying computing paradigms, Von Neumann, Neuromorphic Engineering, and Quantum Computing, that comprehensively complement the structure of cognitive computation. The research discusses possible applications and open research directions under the COC umbrella.
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
Robotic teleoperation is a key technology for a wide variety of applications. It allows sending robots instead of humans to remote, possibly dangerous locations while still using the human brain with its enormous knowledge and creativity, especially for solving unexpected problems. A main challenge in teleoperation consists of providing enough feedback to the human operator for situation awareness, thus creating full immersion, as well as offering the operator suitable control interfaces to achieve efficient and robust task fulfillment. We present a bimanual telemanipulation system consisting of an anthropomorphic avatar robot and an operator station providing force and haptic feedback to the human operator. The avatar arms are controlled in Cartesian space with a direct mapping of the operator movements. The measured forces and torques on the avatar side are haptically displayed to the operator. We developed a predictive avatar model for limit avoidance which runs on the operator side, ensuring low latency. The system was successfully evaluated during the ANA Avatar XPRIZE competition semifinals. In addition, we performed in-lab experiments and carried out a small user study with mostly untrained operators.
Compared to regular cameras, Dynamic Vision Sensors or Event Cameras asynchronously output compact visual data based on changes in intensity at each pixel location. In this paper, we study the application of current image-based SLAM techniques to these novel sensors. To this end, the information in adaptively selected event windows is processed to form motion-compensated images. These images are then used to reconstruct the scene and estimate the 6-DOF pose of the camera. We also propose an inertial version of the event-only pipeline to assess its capabilities. We compare the results of different configurations of the proposed algorithm against the ground truth for sequences of two publicly available event datasets. We also compare the results of the proposed event-inertial pipeline with the state of the art and show that it can produce comparable or more accurate results, provided the map estimate is reliable.
Blockchain, also referred to as decentralized AI, has the potential to make AI more trustworthy by creating decentralized trust in privacy, security, and auditability. However, systematic studies on the design principle of Blockchain as a trust engine for an integrated society of Cyber-Physical-Social-System (CPSS) are still absent. In this article, we provide an initiative for seeking the design principle of Blockchain for a better digital world. Using a hybrid method of qualitative and quantitative studies, we examine the past origin, the current development, and the future directions of Blockchain design principles. We have three findings. First, the answers to whether Blockchain lives up to its original design principle as a distributed database are controversial. Second, the current development of the Blockchain community reveals a taxonomy of 7 categories: privacy and security, scalability, decentralization, applicability, governance and regulation, system design, and cross-chain interoperability. Both research and practice are centered more on the first category, privacy and security, and the fourth category, applicability. Future scholars, practitioners, and policy-makers have vast opportunities in the other, much less exploited facets and in the synthesis at the interface of multiple aspects. Finally, based on counter-examples, we conclude that a synthetic solution that crosses discipline boundaries is necessary to close the gaps between the current design of Blockchain and the design principle of a trust engine for a truly intelligent world.